Training learnable metrics using modern language models has recently emerged as a promising method for the automatic evaluation of machine translation. However, existing human evaluation datasets in text simplification are limited by a lack of annotations, unitary simplification types, and outdated models, making them unsuitable for this approach. To address these issues, we introduce the SIMPEVAL corpus that contains: SIMPEVAL_ASSET, comprising 12K human ratings on 2.4K simplifications of 24 systems, and SIMPEVAL_2022, a challenging simplification benchmark consisting of over 1K human ratings of 360 simplifications including generations from GPT-3.5. Training on SIMPEVAL_ASSET, we present LENS, a Learnable Evaluation Metric for Text Simplification. Extensive empirical results show that LENS correlates better with human judgment than existing metrics, paving the way for future progress in the evaluation of text simplification. To create the SIMPEVAL datasets, we introduce RANK & RATE, a human evaluation framework that rates simplifications from several models in a list-wise manner by leveraging an interactive interface, which ensures both consistency and accuracy in the evaluation process. Our metric, dataset, and annotation toolkit are available at https://github.com/Yao-Dou/LENS.
translated by 谷歌翻译
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
Recent studies have found that pain in infancy has a significant impact on infant development, including psychological problems, possible brain injury, and pain sensitivity in adulthood. However, due to the lack of specialists and the fact that infants are unable to express verbally their experience of pain, it is difficult to assess infant pain. Most existing infant pain assessment systems directly apply adult methods to infants ignoring the differences between infant expressions and adult expressions. Meanwhile, as the study of facial action coding system continues to advance, the use of action units (AUs) opens up new possibilities for expression recognition and pain assessment. In this paper, a novel AuE-IPA method is proposed for assessing infant pain by leveraging different engagement levels of AUs. First, different engagement levels of AUs in infant pain are revealed, by analyzing the class activation map of an end-to-end pain assessment model. The intensities of top-engaged AUs are then used in a regression model for achieving automatic infant pain assessment. The model proposed is trained and experimented on YouTube Immunization dataset, YouTube Blood Test dataset, and iCOPEVid dataset. The experimental results show that our AuE-IPA method is more applicable to infants and possesses stronger generalization ability than end-to-end assessment model and the classic PSPI metric.
translated by 谷歌翻译
Physics-informed neural networks (PINNs) have lately received significant attention as a representative deep learning-based technique for solving partial differential equations (PDEs). Most fully connected network-based PINNs use automatic differentiation to construct loss functions that suffer from slow convergence and difficult boundary enforcement. In addition, although convolutional neural network (CNN)-based PINNs can significantly improve training efficiency, CNNs have difficulty in dealing with irregular geometries with unstructured meshes. Therefore, we propose a novel framework based on graph neural networks (GNNs) and radial basis function finite difference (RBF-FD). We introduce GNNs into physics-informed learning to better handle irregular domains with unstructured meshes. RBF-FD is used to construct a high-precision difference format of the differential equations to guide model training. Finally, we perform numerical experiments on Poisson and wave equations on irregular domains. We illustrate the generalizability, accuracy, and efficiency of the proposed algorithms on different PDE parameters, numbers of collection points, and several types of RBFs.
translated by 谷歌翻译
We develop an online kernel Cumulative Sum (CUSUM) procedure, which consists of a parallel set of kernel statistics with different window sizes to account for the unknown change-point location. Compared with many existing sliding window-based kernel change-point detection procedures, which correspond to the Shewhart chart-type procedure, the proposed procedure is more sensitive to small changes. We further present a recursive computation of detection statistics, which is crucial for online procedures to achieve a constant computational and memory complexity, such that we do not need to calculate and remember the entire Gram matrix, which can be a computational bottleneck otherwise. We obtain precise analytic approximations of the two fundamental performance metrics, the Average Run Length (ARL) and Expected Detection Delay (EDD). Furthermore, we establish the optimal window size on the order of $\log ({\rm ARL})$ such that there is nearly no power loss compared with an oracle procedure, which is analogous to the classic result for window-limited Generalized Likelihood Ratio (GLR) procedure. We present extensive numerical experiments to validate our theoretical results and the competitive performance of the proposed method.
translated by 谷歌翻译
The task of Compositional Zero-Shot Learning (CZSL) is to recognize images of novel state-object compositions that are absent during the training stage. Previous methods of learning compositional embedding have shown effectiveness in closed-world CZSL. However, in Open-World CZSL (OW-CZSL), their performance tends to degrade significantly due to the large cardinality of possible compositions. Some recent works separately predict simple primitives (i.e., states and objects) to reduce cardinality. However, they consider simple primitives as independent probability distributions, ignoring the heavy dependence between states, objects, and compositions. In this paper, we model the dependence of compositions via feasibility and contextuality. Feasibility-dependence refers to the unequal feasibility relations between simple primitives, e.g., \textit{hairy} is more feasible with \textit{cat} than with \textit{building} in the real world. Contextuality-dependence represents the contextual variance in images, e.g., \textit{cat} shows diverse appearances under the state of \textit{dry} and \textit{wet}. We design Semantic Attention (SA) and generative Knowledge Disentanglement (KD) to learn the dependence of feasibility and contextuality, respectively. SA captures semantics in compositions to alleviate impossible predictions, driven by the visual similarity between simple primitives. KD disentangles images into unbiased feature representations, easing contextual bias in predictions. Moreover, we complement the current compositional probability model with feasibility and contextuality in a compatible format. Finally, we conduct comprehensive experiments to analyze and validate the superior or competitive performance of our model, Semantic Attention and knowledge Disentanglement guided Simple Primitives (SAD-SP), on three widely-used benchmark OW-CZSL datasets.
translated by 谷歌翻译
This paper addresses the quality issues in existing Twitter-based paraphrase datasets, and discusses the necessity of using two separate definitions of paraphrase for identification and generation tasks. We present a new Multi-Topic Paraphrase in Twitter (MultiPIT) corpus that consists of a total of 130k sentence pairs with crowdsoursing (MultiPIT_crowd) and expert (MultiPIT_expert) annotations using two different paraphrase definitions for paraphrase identification, in addition to a multi-reference test set (MultiPIT_NMR) and a large automatically constructed training set (MultiPIT_Auto) for paraphrase generation. With improved data annotation quality and task-specific paraphrase definition, the best pre-trained language model fine-tuned on our dataset achieves the state-of-the-art performance of 84.2 F1 for automatic paraphrase identification. Furthermore, our empirical results also demonstrate that the paraphrase generation models trained on MultiPIT_Auto generate more diverse and high-quality paraphrases compared to their counterparts fine-tuned on other corpora such as Quora, MSCOCO, and ParaNMT.
translated by 谷歌翻译
开放世界对象检测是一个更具笼统和挑战性的目标,旨在识别和本地化由任意类别名称描述的对象。最近的工作GLIP通过将检测数据集的所有类别名称连接到句子中,从而将此问题作为接地问题,从而导致类别名称之间的效率低下的相互作用。本文介绍了Distclip,这是一种通过诉诸于设计概念词典的知识富集,是一种平行的视觉概念训练预训练方法,用于开放世界检测。为了提高学习效率,我们提出了一种新型的并行概念公式,该公式分别提取概念,以更好地利用异质数据集(即检测,接地和图像文本对)进行培训。我们进一步设计了来自各种在线资源和检测数据集的概念字典〜(带有描述),以提供每个概念的先验知识。通过用描述丰富这些概念,我们明确地建立了各种概念之间的关系,以促进开放域学习。所提出的概念词典进一步用于提供足够的负面概念,用于构建单词区域对齐损失\,并完成图像对文本对数据标题中缺少描述的对象的标签。所提出的框架显示出强烈的零射击性能性能,例如,在LVIS数据集上,我们的DETCLIP-T优于9.9%的地图GLIPT-T优于GLIP-T,并且与完全避免的型号相比,稀有类别的稀有类别提高了13.5%。作为我们的。
translated by 谷歌翻译
现代医疗保健系统正在对电子病历(EMR)进行连续自动监视,以识别频率越来越多的不良事件;但是,许多败血症等事件都没有明确阐明前瞻性(即事件链),可用于识别和拦截它的早期不良事件。目前,尚无可靠的框架来发现或描述不良医院事件之前的因果链。临床上相关和可解释的结果需要一个框架,可以(1)推断在EMR数据中发现的多个患者特征(例如,实验室,生命体征等)中的时间相互作用,并且(2)可以识别(s)的模式(s)。到即将发生的不良事件(例如,败血症)。在这项工作中,我们提出了一个线性多元霍克斯进程模型,并与$ g(x)= x^+$链接函数结合起来允许潜在的抑制作用,以恢复Granger Causal(GC)图。我们开发了一个基于两阶段的方案,以最大程度地提高可能性的替代品以估计问题参数。该两相算法可扩展,并通过我们的数值模拟显示有效。随后将其扩展到佐治亚州亚特兰大的Grady医院系统的患者数据集,在那里,合适的Granger Causal图识别出败血症之前的几个高度可解释的链。
translated by 谷歌翻译
热应力和变形的快速分析在热控制措施和卫星结构设计的优化中起着关键作用。为了实现卫星主板的实时热应力和热变形分析,本文提出了一种新型的多任务注意UNET(MTA-UNET)神经网络,将多任务学习(MTL)和U-NET的优势结合在一起注意机制。此外,在训练过程中使用了物理知识的策略,其中部分微分方程(PDE)被整合到损失函数中作为残留项。最后,将基于不确定性的损失平衡方法应用于重量的多个培训任务的不同损失功能。实验结果表明,与单任务学习(STL)模型相比,提出的MTA-UNET有效提高了多个物理任务的预测准确性。此外,物理信息的方法在每个任务的预测中的错误较小,尤其是在小型数据集上。代码可以在:\ url {https://github.com/komorebitso/mta-unet}下载。
translated by 谷歌翻译